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Abstract 

Wireless sensor networks are widely adopted in military, civilian and commercial applications, 
which fuels an exponential explosion of sensory data. However, a major challenge to deploy effective 
sensing systems is the presence of massive missing entries, measurement noise, and anomaly readings. 
Existing works assume that sensory data matrices have low-rank structures. This does not hold in 
reality due to anomaly readings, causing serious performance degradation. In this paper, we introduce 
an LS-Decomposition approach for robust sensory data recovery, which decomposes a corrupted data 
matrix as the superposition of a low-rank matrix and a sparse anomaly matrix. First, we prove that LS- 
Decomposition solves a convex program with bounded approximation error. Second, using data sets from 
the IntelLab, GreenOrbs, and NBDC-CTD projects, we hnd that sensory data matrices contain anomaly 
readings. Third, we propose an accelerated proximal gradient algorithm and prove that it approximates 
the optimal solution with convergence rate 0{l/lE) (k is the number of iterations). Evaluations on real 
data sets show that our scheme achieves recovery error < 5% for sampling rate > 50% and almost exact 
recovery for sampling rate > 60%, while state-of-the-art methods have error 10% ^ 15% at sampling 
rate 90%. 
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I. Introduction 

Wireless sensor networks (WSNs) Uj] are constantly generating an enormous amount of rich and 
diverse information. WSNs are widely adopted in military, civilian and commercial applications, such 
as intrusion detection in hattlefields yj], search and rescue systems yO, infrastructure monitoring 1^, 
environment monitoring etc. The increasing number of hig data sources have fueled an exponential 
explosion of sensory data Utilizing such a huge amount of sensory data for information suhstraction 
and interpretation, we are able to quantitatively understand physical phenomena and perform actively 
control over cyber-physical systems (CPS). 

However, a major challenge to deploy effective sensor systems is the presence of massive missing 
entries, measurement noise, and anomaly readings. Missing entries will detriment the usability and 
reliability of the sensory data sets, while measurement noise and anomaly readings will cause erroneous 
conclusions and decisions. The data loss rate in real-world projects can be as high as 5% ~ 95%, as 
shown in Fig. |4] Section V, due to unreliable wireless communications such as poor link quality and packet 
collision. Moreover, measurement noise and anomaly readings are ubiquitous in real-world projects mainly 
due to: (1) commodity sensors have low accuracy; (2) some sensor nodes have malfunctions; and (3) the 
dynamic surroundings may cause disturbances. Therefore, it is necessary to distinguish environmental 
data from anomalies readings and measurement noise in a robust and accurate way. 

Network data are naturally represented as a matrix in which rows denote nodes and columns denote time 
slots. Therefore, many recent works apply matrix completion techniques to perform data recovery [g-la] 
and data collection iloL lllll in WSNs. Compression is performed in the data collection phase to reduce 
communication burden, while matrix completion is applied at the sink node to recover the raw sensory 


data. However, the standard matrix completion il2l4l4ll experiences serious performance degradation with 


the presence of a small portion of anomalies. Fig. [T] shows a simple example, a 100 x 100 matrix with 
rank r = 5 taking values in [0,100], can be exactly recovered using 30% uniformly random selected 
samples jldll . However, if we add anomalies with value = 100, two difference ratios 0.5% and 3%, the 
recovery errors (f' 2 -norm) increase significantly. 

The performance degradation comes from the fact that: even a small portion of anomaly readings can 
break down the low-rank structure. As shown in Fig. |2l adding 0.5% anomalies into the above rank 
r = 5 matrix, the energy captured in the top 10 singular values is 80%, while it is only 53% when 3% 


anomalies are added. In Section V, using data sets collected from the IntelLab il5ll . GreenOrbs 


H, 


and 


NBDC-CTD lllTl] projects, we observe that anomaly readings are common and ubiquitous. Therefore, we 
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Matrix Completion with Anomaly Readings 



Fig. 1. Performance of matrix completion: exactly low-rank matrix, compared with matrices with 0.5% and 3% anomalies. 


find that existing works hastily assume low-rank structures. 

In this paper, we introduce an LS-Decomposition approach for sensory data recovery, which can deal 
with massive data loss and is robust to both measurement noise and anomaly readings. By modeling mea¬ 
surement noise as small entry-wise noise, and anomaly reading as gross sparse error, LS-Decomposition 
decomposes a sensory data matrix into a low-rank matrix and a sparse anomaly matrix. Firstly, we present 
observations on real data sets from the IntelLab ll^ . GreenOrbs and NBDC-CTD projects, 
showing that (1) the original data matrices have massive missing entries and (2) each data matrix is the 
superposition of a low-rank matrix and a sparse anomaly matrix. Secondly, we formulate the robust data 
recovery problem as an optimization problem, coined as LS-Decomposition, and prove that solving a 
convex program achieves bounded approximation. Thirdly, we propose an accelerated proximal gradient 
algorithm for LS-Decomposition, theoretical results guarantee that our algorithm approximates the optimal 
solution with convergence rate 0{l/k‘^) {k is the number of iterations). Finally, evaluations on real data 
sets show that our scheme achieves recovery error < 5% for sampling rate > 50% and almost exact 
recovery for sampling rate > 60%, while state-of-the-art methods have error 10% ~ 15% at sampling 
rate 90%. 

The reminder of the paper is organized as follows. Section II discusses related works. System models 
and problem formulation are given in Section III. We present conditions and theoretic guarantees for 
optimal LS-Decomposition in Section IV. Observations are presented in Section V. Our algorithm is 
in Section VI, while Section VII describes performance evaluation. We conclude in Section VIII. The 
appendix provides detailed proofs. 
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Empirical CDF of Singular Values 



Fig. 2. Empirical CDF of energy captured by the top K singular values: exactly low-rank matrix, compared with matrix with 
0.5% and 3% anomalies. 


II. Related Work 


We first describe the applications of the standard matrix completion in wireless sensor networks. Then, 
we discuss several variants of matrix completion. 

Data loss is revealed to be ubiquitous and unavoidable in sensor networks In order to improve the 
reliability and useability of decisions draw from such incomplete sensory data, a data recovery process 


is needed. Since the raw sensory data are redundant lllSh . it is able to estimate the original environment 


data from partial observations. In authors show that data matrices of temperature, humidity and light 
are low-rank and have high spatiotemporal correlations, while similar empirical observations are also 
presented in 111]. To recover effectively, flSi apply data preprocess on the raw data sets to exclude 
anomaly readings, and then matrix completion to perform data recovery. However, there are no good 
criteria for identifying anomaly readings while la HI are based on researchers’ experience. Furthermore, 
the inter-correlation among multiple attributes are exploited to improve the recovery accuracy 171]. Matrix 
completion is also suitable for seismic data recovery 1^. 

Besides data recovery, matrix completion is also applied to improve data collection llOLIllIl in WSNs. 


Raw-data collection is rather inefficient lll9h . since WSNs are typically composed of hundreds to thousands 


of sensor nodes generating tremendous amount of sensory readings. As the packet loss problem and the hot 
spot problem surface, this approach will lead to a large number of retransmissions in real-world situations 
and node failures as batteries run out. Therefore, researchers apply matrix completion to reduce global 


traffic lIlOll . Compression is performed in the data collection phase to reduce communication burden. 


while matrix completion is applied at the sink node to recover the raw sensory data. Furthermore, the 
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authors of jl fill propose an online data gathering scheme for weather data. 


A natural technique to throve the recovery performance of the standard matrix completion is to add a 
smooth regulation term ialZllSillllj. This intuition is inspired hy the fact that almost all physical conditions 
are smooth fields, i.e., the physical conditions are continuous without sudden changes. However, all the 
above schemes are based on a critical assumption that the matrix of interest exhibits a low-rank structure, 
which does not hold in reality. As shown in Fig. |2j |4] and [5l real-world data matrices violate such 
assumption due to the existence of a small portion of anomaly readings. Therefore, the standard matrix 


1311 and the smoothed counterpart experience serious performance degradation in practical 


completion 11121 
scenarios. 

Noticing the ubiquitousness of anomalies in practical scenarios, researc 


the data matrix into a low-rank matrix and a sparse anomaly matrix 1201 - 


lers propose to decompose 


2211 and prove its universal 


applicability. The authors prove that it is possible to exactly separate those two components. These 
works inspire us in proving that our problem has bounded approximation, however, our work essentially 
differs from theirs in terms of modells and goals: (1) no measurement noise is allowed in their model. 
They consider an exactly low-rank matrix, while we deal with an approximately low-rank matrix which 
is the case for sensory data matrices; (2) full observations of all entries are required. They target at the 
possibility of separating two matrices while we aim to recover the low-rank matrix. 


III. System Model and Problem Statement 

A. Notations 

We go over the notations and preliminaries. Throughout the paper, N denotes the number of nodes in 
the wireless sensor network, T denotes the number of slots in the sensing period of interest, [A^] denotes 
the set {1, 2,..., A^}, R G denotes the data matrix of interest, L G denotes the low-rank 

matrix of interest, and S G R^^^ denotes the anomaly matrix of interest. Let 12 C [A^] x [T] denote the 
support set of S (i.e., the index set of the nonzero entries), O C [A^] x [T] denote the index set of the 
observed entries. R* denotes the transpose of R. The operator Vo{R) projects matrix R onto its index 
set O, i.e., the (n,f)-th entry of Vo{R) is equal to Rnt if {n,t) G O and zero otherwise. 

The ^Q-norm is defined as ||5'||o = |supp(S')|, where supp(S') = {{n,t) : Snt / 0} denotes the 
support set of S and |supp(5)| denotes the cardinality of supp(S'). The ^i-norm is defined as ||5||i = 
Ylt=i ^bs(S'„t) where ahs{Snt) denotes the absolute value of Snt- The ^ 2 -norm of vector A G 
7?.”* is defined as ||A ||2 = ^j- The Frobenius norm of matrix R is defined as ||f?||F = 

\/Y^n=iYlt=i^nt- Lef L = UEV* = Y^\=iCriUiV* denote the singular value decomposition (SVD) 
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of L, where r is the rank, cji > ... > (7^ > 0 are the singular values, and U = [ui,..., 1^ = [vi, ...,Vr\ 

are the matrices of left- and right-singular vectors, respectively. The nuclear norm of L is defined as 


= Given a matrix pair X = (L, S), let 


More notations are needed in our proofs as in ll20l - l22ll . Let T denote the suhspace generated hy 




If A(||L||2 +||S||2)l/2, 


matrices with the same column space or row space of L: 

T={UQ* + RV*\Q,Re. 




( 1 ) 


and Vt he the projection operator onto the suhspace T. Define the projection operator Vt x Rq. '■ 
{L,S) ^ {VrL,VnS). Define the suhspace T = {{Q,Q)\Q G and = {{Q,-Q)\Q € 

I^nxn}, jgj- denote their respective projection operators. 


B. Network Model 

We consider a wireless sensor network consisting of N sensor nodes and a sink. Sensor nodes are 
scattered in a target field fo monifor physical condifions such as femperafure, humidify, illuminafion, gas 
concentration, magnetic strength, etc. They report sensory readings to the sink periodically over a given 
time span. 

The monitoring period is evenly divided into T time slots, denoted as {0,1,..., f,..., T — 1}. Each 
sensor node generates a record in each slot. A record at a sensor node includes the sensor readings, node 
ID, time stamp, and location (longitude and latitude). Its format is: 


Record: 

sensor readings 

node ID 

time stamp 

location 


Let Ri^t denote the sensory reading of the i-th node at slot t. For each physical condition, the sensor 
readings generated hy all sensor nodes can he represented hy a matrix R G as follows: 

i?0,0 Ro,t ... Ro,T-1 

Ri,o ■■■ Ri,t ■■■ Ri,t-i 


R = 


Rn- 1,0 ■■■ RN-l,t ■■■ Rn-1,T-1 

where rows denote nodes, and columns denote time slots. 


( 2 ) 


C. Measurement Model 


It is widely believed that sensory data exhibit strong spatio-temporal correlation il8l ? ]. Compressive 


sensing theory il2L 


1311 introduces the general notion of “low-rank” to model this characteristic. In this 
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paper, we assume that the actual sensory data of a physical condition is an approximately low-rank matrix 
L G as in IhUllll. which means that most of its energy is captured hy its rank-r approximation. The 


observed sensory reading matrix R was generated hy corrupting the entries of L with measurement noise 
and anomaly readings. We model measurement noise as additive small entry-wise noise, and anomaly 
readings as additive gross sparse errors, respectively, i.e., Z € with ||2 ^||f < <5 for some (5 > 0, 

and S G with ||5||o < k. It is reasonable to assume that the number of anomaly readings is 

relatively small, compared with the size of the sensory data matrix, i.e., k <C NT. Let C [A^] x [T] 
denote the support set of S, i.e., the index set of the nonzero entries. We have the following measurement 
model: 

R = L + S + Z. (3) 


D. Problem Statement 


Sensor nodes cooperate with each other to transmit packets back to the sink via multi-hop wireless 
communication. Two major factors lead to massive missing entries: (1) in each hop, poor link quality 
results in decoding failures at the receiver node; (2) along the multi-hop path, packet collisions are 
unavoidable since wireless communication utilizes unregulated media access. 

Therefore, the sink receives an incomplete measurement matrix M. The data loss rate can be as high 
as 5% ~ 95% as shown in Fig. |4] Section |Vll Suppose the collected entries are indicated by the set 
O C [iV] X [r] and O has size m. We assume that the missing entries are randomly distributed, either 


uniformly random or non-uniform random. Please refer to 112011 for more mathematic details about allowed 


distributions. The data collection process can be represented as: 


M = Vo{R), (4) 

where the operator Vo{X) projects X onto the index set O. 

To estimate the low-rank matrix L from the partial observations M, a direct conceptual solution is: 
to seek a L with the lowest-rank that could have generated the data matrix R, subject to the constraints 
that the gross errors are sparse and the entry-wise errors are small. We formulate the LS-Decomposition 
problem as follows: 


Problem 1. (LSD) Assuming R = L + S + Z and given its partial observation M = Vo{R) on index set 
O, where L,S,Z are unknown, but L is known to be low-rank, S is known to be sparse, and ||.Z^||f < ^ 






for some <5 > 0, recover L. The Lagrangian formulation is: 

(I/,5) = mm rank{L) + X\\S\\o, 

(L,S} 

s.t. \\Vo{R-L-S)\\f<5, 

where we choose A = Xjy/n, and L, S are estimates of L, S, respectively. 
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(5) 


IV. Existing Results for LS-Separation 

A special case of the LSD problem (|5]l is when we have full observation of R, then (IS) is reduced to 
the following LS-Separation problem studied in i20l - l22ll . In this section, we summarize the conditions 
for unique solution and existing results. 


Problem 2. (LSS) Assuming R = L -\- S -\- Z and given full observation of R as O = [A^] x [T], recover 
the components L and S. The Lagrangian formulation is: 


(I/,5) = min rank{L)X\\S\\o, 

{L,S} 


( 6 ) 


s.t. < (5. 

The LSS problem ® is a highly nonconvex optimization problem and no efficient solution is known, 
since (l6]l subsumes both the low-rank matrix completion problem and the ^o-miniinization problem. Both 
of them are NP-hard and hard to approximate 12911 . To obtain a tractable optimization problem, one can 
relax the LSS problem ® ^ placing the iQ-norm with the £i-norm and the rank function with the 


nuclear norm as in [12, 




24 


2611 . yielding the following problem: 


Problem 3. The convex surrogate of the LSS problem ® is: 

(L,5) = min ||L||* + A||5||i, 
{L,S) 

S.t. \\R- L-S\\f <6 


(7) 


A. Conditions for Unique Solution 

Before analyzing the optimality of problem ([7]), one should answer the following a basic question 
first: When is LS-Separation possible? At first sight, one may think that the objective of problem (|5]) 
is impractical. Lor instance, suppose the matrix R is equal to eie^ (ei is a canonical basis vectors, 
the resulting matrix eie^ has a one in the top left comer and zeros everywhere else), then since R is 
both sparse and low-rank, how can we decide whether it is low-rank or sparse? To make problem (|5]) 
meaningful, we need to impose that the low-rank component L is not sparse, and the sparse component 
S is not low-rank. 
























1) The Low-Rank Matrix is Not Sparse: Let L = ITEV* = denote the singular value 

decomposition (SVD) of L G where r is the rank, cji, ...,ar are the singular values, and U = 

[ui, ...,Ur],V = [vi,...,Vr\ are the matrices of left- and right-singular vectors, respectively. 


The incoherence condition introduced in 112, 


1311 asserts that for small values of parameter p, the 


singular vectors are reasonably spread out, namely, the low-rank matrix L is not sparse. It states that: 

\UV 


max ||[/*ejlP < —, 
. M *11 - 


NT/* ii2 / h-r 
max ||1/ Cill < —, 


oo ^ 


pr 


NT' 


( 8 ) 


where Cj’s are the canonical basis vectors. 

2 ) The Sparse Matrix is Not Low-Rank: Another identifiability issue arises if the sparse matrix S G 
^NxT low-rank. For instance, all the nonzeros entries of S lie in a column or in a few columns, 
suppose that the columns of S happens to be the opposite of those of L, then it is clear that we would 
not be able to recover L and S by any method whatsoever since R = L-ir S would have a column space 
equal to, or included in that of L. To avoid such pathological cases, we assume that the support of sparse 


component S is selected uniformly at random among all subsets of size k as in i20l - l22ll . 


B. Existing Main Results 


The main result of 12211 is that under the above conditions, ([7]) gives a stable estimate of L and S. 


Lemma 1. / l22l/ Suppose that L obeys (H]) and the support of S is uniformly distributed. Then ifrank{L) < 
Prnp~^{\.ogn)~‘^ and |n| < ps'n? with Pr,Ps > 0 being sufficiently small numerical constants. Assume 
IIT’oT’tII ^ 1/2, A < 1/2, let X = {L,S) be the solution to ([71). Then with high probability, X satisfies: 

\\X -X\\f = IIL + 5-L- S||f < (8V5n +V2)5, (9) 

where n = max(A^, T). 

We can see that ([7]) is simultaneously stable to small entry-wise noise and robust to gross sparse 
anomalies. Note that the error bound for each entr y is proportional to the noise level 5. As for “with 


high probability”, there is no exact result while 11201 - 12211 proved that the probability is at least 1 — cn 
where c is a numerical constant. 


V. Conditions and Performance Guarantees for LS-Decomposition 
A. Problem Analysis 

Following the framework in Section jlVl we relax problem (|5]) by replacing the ^o'liorm with the £i- 
norm and the rank function with the nuclear norm. Then, we provide a condition for unique solution and 
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the corresponding theoretical guarantees. 

Problem 4. The convex surrogate of the LSD problem (O is: 

(L,5) = min ||L||* + A||5||i, 
{L,S) 

s.t. \\Vo{R-L-S)\\F<d 


( 10 ) 


B. Partial Observation 

However, those two conditions in Section|IV]do not suffice to guarantee unique solution for problem ([5]). 
For example, all {{L, S), {L, Si),{L, Sn)} are optimal solutions as long as Vo{Si) = ■■■Vo{Sn) = 
Vo{S). To avoid such predicament, we set the following artificial selling for ([5]) and (jT) fhroughout fhe 
paper: 

ll’^C)-L(S)ll^ = \\'Po^{Si)\\F = = \\'Po^{S)\\f = 0. (11) 

This is quite reasonable because our aim is to recover the low-rank matrix L and we do not want to 
recover the anomaly matrix. 

Remark 1. Partial observation of R provides partial recovery of its sparse component S, i.e., only the 
corresponding subset of entries are observable. 

C. Theoretic Guarantees 

Under the above conditions, we have the following theoretical results: 1) problem (|5]) is possible by 
solving the convex program (ITOl ). and 2) the precise closed form of the approximation bound. The detailed 
proofs are given in the appendix. 

Theorem 1. Suppose that L obeys (IS]) and the support of S is uniformly distributed. Then if rank(L) < 
Prnp~^{log n)~‘^ and |U| < with Pr,Ps > 0 being sufficiently small numerical constants. Assume 
IIT’oT’t'II ^ 1/2, A < 1/2, let X = (L, S) be the solution to diOD . Then with high probability, X satisfies: 

\\X - X\\f = \\L +S - L- S\\f < {8ns/AOn + 40n/p + 5 + V2)6, (12) 


where n = max(A^, T). 



II 


TABLE I 

Data sets for the compressibility characterization. 


Data Sets 

Environment 

Nodes 

Time period 

Resolution 

Physical conditions 

Size 

IntelLab 

Indoor 

54 

Feb.28 ~ Apr.5, 2004 

30 seconds 

Temperature, light, humidity 

54 X 500 

GreenOrbs 

Forest 

326 

Aug.03 ~ 05, 2011 

60 seconds 

Temperature, light, humidity 

326 X 750 

NBDC CTD 

Ocean 

216 

Oct.26 ~ 28, 2012 

10 minutes 

Temperature, salt, conductivity 

216 X 300 


VI. Revealing Anomaly Readings in Real-World Wireless Sensor Networks 
A. Data Sets 

Table I lists the sensory data matrices used in this paper. These data sets are collected by the IntelLab 


Hi 


151], GreenOrbs lllhll . and NBDC-CTD Ill7ll projects, which are deployed in indoor, mountain and ocean 


environments, rppectively. 

IntelLab lisll : is deploy in the Intel Berkeley Research lab from Feb. 28th to Apr. 5th, 2004. There 
are 54 Mica2 nodes places in a 40m x 30m room. The nodes are set to send back a packet every 30 
seconds. This data set includes in total 2, 313,682 data packets. 


GreenOrbs lllhll : is deployed on the Tianmu Mountain, Zhejiang Province, China. In total, there are 


326 nodes deployed. Nodes are set to transmitting packets back to the sink node every minute. We use 
the data collected in Aug. 3nd ~5th, 2011, including 308,928 data packets. 

NBDC CTD iHj: by the National Oceanic and Atmospheric Administration’s (NOAA) National Data 
Bouy Center (NDBC). CTD (Conductivity, Temperature, and Depth) is a shipboard device consisting of 
many small probes. Nodes are set to report every 10 minutes. We use 8,107 data packets collected during 
Oct. 26th ~ 28th, 2012. 


B. Data Loss Rate 

Due to unreliable wireless communication and packet collision, the raw data matrices at the sink node 
are incomplete. To quantify the data loss problem, we define a loss rate for each node as ri = CijT, i £ 
{1, 2,n}, where Ci is the number of lost readings of the f-th sensor node. The empirical cumulative 
distribution function of data loss rates is shown in Fig. 4. The data loss rates in real-world sensor networks 
are quite high, being 5% ~ 95%. 
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(a) Intel Indoor 


(b) GreenOrbs 


(c) NBDC CTD 


Fig. 3. Overview of the sensor network deployment for the Intel Indoor, GreenOrbs and NBDC CTD projects. 
Empirical CDF 




Fig. 4. (Left) The data loss rates of the IntelLab, GreenOrbs, and NBDC-CTD projects, deployed in indoor, mountain, and 
ocean environment, repectively. (Right) The percentage of anomaly readings, determined by the LS-Separation I3l . G,I,N are 
short for GreenOrbs, IntelLab, and NBDC CTD, while H, L, T, C, S are short for humidity, light, temperature, conductivity, 
and salt. 


C. Low-Rank Property with Anomaly Readings 

First, we construct full data matrices listed in Table. I, serving as the ground truth for observations 
and evaluations. Set a time window which is ten times the monitoring resolution for each data set. For 
each time slot, we randomly select one of the observed readings. 

For each raw data matrix, we apply singular value decomposition (SVD) to examine if it has a good 
low-rank approximation. The metric we use is the faction of total energy captured by the top portion of 
singular values. For example, the top 10% portion of singular values for the IntelLab matrices means 


Empirical CDF of IntelLab 



Empirical CDF of GreenOrbs 


• - Raw H 
'■'-'Raw L 
'■'■'Raw T 
^—Processed H 
^—Processed L 
^—Processed T 


Empirical CDF of NBDC CTD 


0 10 20 30 40 50 60 70 80 90 100 

Top Portion of Singular Values * 100 {%) 



Fig. 5. The CDF of Singular Values for IntelLab, GreenOrbs, and NBDC CTD: svd of the raw data matrix, compared with 
processed data matrix. H, L, T, C, S are short for humidity, light, temperature, conductivity, and salt. 
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the largest 5 singular values. We calculate ( *^0 / ( Si=i ) > where (Tj is the z-th largest singular 


value and measures the total energy of the data matrix. Note that 1 — j / 

is the relative approximation error of the best rank-5 approximatioi^ with respect to the Frohenius norm. 

ifl], 


Next we process the raw data matrices with LS-separation 112011 


which identifies the anomaly 


readings and returns a low-rank matrix. Then, we apply SVD on the processed low-rank matrices in a 
similar way. 

Fig. |4] shows the ratios of anomaly reading for each data matrix, ranging from 2.77% to 5.51%. Fig. 
| 5 ] shows the CDFs of singular values for the raw data matrices and the processed data matrices. We can 
see that hy excluding the anomalies, the processed data matrices has strengthened low-rank structures. 
For the IntelLah project, 18% singular values capture 90% energy of the raw temperature matrix while 
the ratio is reduced to 5%; for light, 22% is reduced to 13%; and for humidity, 10% is reduced to 5%. 
Similar observations hold for the GreenOrbs and NBDC CTD projects. 


VII. Accelerated Proximal Gradient Approach 
A. General Accelerated Proximal Gradient Algorithm 


Accelerated Proximal Gradient algorithms (APGs) 12711 . with the general flow shown in Table II, solve 
problems with the following general form: 


min F{X) ^ pg{X) + f{X), 

-X-GH 


(13) 


where PL is a. real Hilbert space with a norm 11 • 11, gr is a continuous convex function, / is convex and 
smooth with Lipschitz continuous gradient: ||V/(Xi) — V/(X 2 )|| < Lf\\Xi — 2f2|| where V/ is the 
Freechet derivative of /. 

APGs repeatedly minimize a proximal operator Q{X,Y) to F{X), defined as: 

Q{X,Y) 4 f{Y) + {Xf{Y),X -Y) + ^\\X-Y\\‘^ + pg{X). 

Since f{X) is convex, for any Y, Q{X, Y) upper bounds F{X). If we define G = Y — f{Y), with 

simple transform and ignore the constant terms, then we have: 


argmin Q(V, F) = argmin pg{X) + ^\\X - GW^. 


(14) 


We repeatedly set = arg minx <3(27,14), as in Table II. It is proved ll28ll that setting 14 = 

tk-l — t ! v_ ^ +2 ^ +2 


+ 


tk 


-{Xk — Xk-i) if — tk+i < t'l results in a convergence rate 0(1/A:^). 
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TABLE II 

General Proximal Gradient Algorithm 


1: while not converged do 

2: n ^ X;, + - Xfc_i); 

3: G^n-^V/(n); 

4: Xk+iaxg-mirix iJ,g{X) + ^\\X - Gk\\^-, 

5: 4+1^ fc^fc + 1; 

6: end while 


TABLE III 

Accelerated Proximal Gradient Algorithm 


Input: 

1: I/O, Iz-i •!— 0; S'o, S'-! <— 1; to, t-i t— 1; -l— 5/ro; k=0. 

2: while not converged do 


3: 


Lk + {Lk Lk-i)- 

4: 


-Sk+'-^^^{Sk-Sk-i). 

5: 

Gk -I- 

- Yi: - \VoX^ + - R) 


6: ([/, 5, F) ^ svd{Gi), Lk = US^k [S]II*. 

7: Gl ^ Yi - \Vo{Y^ + if - R)- 

8: Sfc+i-I—5^ [Gf ]. 

2 

9: tfc+i -I- max{gk,'P), fe ■<- A: + 1. 

10: end while 

Output! L S — 5*/c- 


Robust Data Recovery by Accelerated Proximal Gradient 
Consider the augmented Lagrangian function of (ITOl ). 

argmin/F(X) ^ /i||L||* + f,X\\S\\i + h\Vo{R -L- .S)|||. (15) 

z 

Different from the conventional APGs, T-L is the space of same-sized matrices endowed with the 
Frohenius norm || • ||f, our iterates are ordered parrs (L^, SC) G x g{XC = /r||Lfc||* -|- 

/rA||5fc||i, and the Lipschitz constant Lf = 2. 
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Write Gk — {G^,Gf) E x and let USV* be the singular value decomposition of G^. 


Following the framework of APGs, we have: 

Lfc+i = US^[S]V*, Sk+i = S^[GI], 
where 5[-] is an element-wise threshold operator , defined as: 

f 

X — e, if X > e 

= 


(16) 


(17) 


X + e, if X < —e ■ 

0, otherwise 

Therefore, we construct our accelerated proximal algorithm as in Table III. Please refer to the appendix 
for the proofs of the following two theorems. 

Theorem 2. Let F{X) = F{L,S) = /r||L||* + /iA||5||i + ^\\Vo{R-L- 5)||^. Then, for all k > I, 
our algorithm achieves the following converge rate: 

A\\Xi-X*\\l 


F{Xk)-F{X*) < 
where X* is optimal solution to diOD . 


A:2 


(18) 


Lemma 2. Let F{X) = F{L, S) = ^||L||* + |uA||S'||i + ^\\Vo{R ~ ~ 5')!I^- Our algorithm converges 

to its global optimal. To achieve F{Xk) — F{X*) < e, our algorithm requires k = 0{f/y/e) iterations. 


VIII. Evaluation 


A. Evaluation Methodology 


Our evaluations are based on the real-world data sets from the IntelLab 11511 . GreenOrbs 11611 . and 


NBDC-CTD ilTII projects. The sensors in each project measure three types of physical conditions. Given 


the ground truth data described in Table I, we randomly drop entries and then compare the recovered 
data matrices with the ground truth. Under different data loss rate, we measure the recovery errors in 


terms of Normalized Square Error (NSE), defined as following: 


-Le) 


NSE = 


(19) 


l^i=l l^j=l ^ij 

where L is fhe estimated low-rank matrix, L is the ground truth. NSE is widely used in data interpolation 
and parameter estimation. 

We compare our LS-Decomposition with the standard matrix completion and the smooth-regulated 
matrix completion. To further understand the gap between LS-Decomposition and the optimal recovery 
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performance, we construct an oracle solution which gives us information about the support of 5 and 


the row (column) space T o 


L. 


(1) Oracle Solution (OS) ll22ll serves as the optimal solution. Assume that we know the support Q of 


S and the row and column spaces T of L, which can he inferred hy performing LS-Separation (problem 
iH as done in Section V) on the ground truth data matrices. It estimates L and S as the solution Lcrade 
and Soracie to the following lease squares problem: 


{^oraclei ^oracle) mill \\R L 

(L,5) 

s.t. LgT^SgQ. 


( 20 ) 


(2) Matrix Completion (MC) il2llil3llil4ll : the standard matrix completion solves the following opti¬ 
mization problem: 


L = min 
L 


( 21 ) 


s.t. \\R — L\\p < 6. 

(3) SRMF ll^: it uses Sparsity Regularized Matrix Factorization that leverages both low-rank and 
spatio-temporal characteristics, as follows: 


L = min ||L||*-|-A5(L), 
L 

s.t. \\R — L\\f < 6, 


( 22 ) 


where S{-) is the smooth term, and A is the weight to balance the low-rank term and the smooth 
term, which is usually determined by experiments. We find A = 0.01 gives the best performance in our 
simulations. S{L) is defined via the diversity of matrix horizontal and vertical difference: 

S{L) = \\V^{L)\\l + \\Vy{L)\\l., (23) 


where VxiL) is an N x {T — 1) matrix representing the horizontal difference of L with element in form 
of T>x{hj) = L{i,j -|- 1) — and T>x{L) is an {N — 1) x T matrix representing the horizontal 

difference of L with element in form of T>y{i,j) = L{i + l,y) — L{i,j). 


B. Performance Results 

We randomly drop entries with data loss rate 10%, 20%, ..., 90%, i.e., with sampling rate 90%, 80%, 
..., 10%. Using the remained data as input, we apply the above four schemes to perform data recovery. 
For the IntelLab, GreenOrbs, and NBDC CTD projects, we test each physical attributes separately. We 
conduct 10 runs for each case and report an average over these 10 runs. 
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Fig. 6. Comparisons for humidity, light, 


GreenOrbs Humidity 



temperature of the IntelLab project. 


GreenOrbs Light 




Fig. 7. Comparisons for humidity, light, temperature of the GreenOrbs project. 


From Fig. 6, 7, and 8, as the sampling rate increases (or the data loss rate decreases), the recovery 
errors for all the four algorithms decrease. LS performs much better than MC and SRMF which do 
not take anomaly readings into consideration. Generally, our scheme achieves recovery error < 5% for 
sampling rate > 50% and almost exact recovery for sampling rate > 60%, while MC and SRMF have 
error 10% ~ 15% at sampling rate 90%. Among all projects, the recovery errors of LS for IntelLah are 
lower than the other two, this is because the data matrices of InteLab have lower rank. More accurately, 
lower ratio of rank to matrix dimension, as shown in Fig. 5. 

Fig. 6 shows the recovery performance for the IntelLab project. For sampling rate 20%, 30%, 40%, 
and 50%, SRMF is worse than MC, since SRMF tries to fit the anomaly readings into a smooth surface. 



NBDC CTD Conductivity 


0 10 20 30 40 

Sampling Rate * 100 (%) 



NBDC CTD Salt 
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Fig. 8. Comparisons for conductivity, salt, temperature of the NBDC CTD project. 
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The recovery errors of MC and SRMF do not decrease further with more samples when sampling rate 
is larger than 60% , LS and OS converge to zero recovery error since LS and OS take anomaly readings 
into consideration. Among the three physical conditions (humidity, light and temperature), the recovery 
error for humidity is the lowest for each case, because the humidity matrix has both lowest rank, as 
shown in Fig. 5, and lowest portion of anomaly readings, as shown in Fig. 4. Although the temperature 
matrix has relatively low rank too, but it also contains more anomaly readings. 

In Fig. 7 and 8, similar results hold for the GreenOrbs and NBDC CTD projects. Compared with 
IntelLab data matrices, we observe bigger gaps between MC/SRMF and LS/OS. The reasons are: (1) 
GreenOrbs and NBDC CTD have lower ratios of rank to matrix dimension, as shown in Fig. 5; (2) 
the dimensions, i.e., min{N, T)), of data matrices in GreenOrbs and NBDC CTD (i.e., 326 and 216, 
respectively), are much bigger than those in Intellab (i.e., 52), as shown in Table I. For larger matrices, 
anomaly readings have weak influences. Note that the temperature of NBDC CTD has much lower 
recovery error than conductivity and salt, the reasons are: (1) the the rank of the temperature matrix is 
much lower, as shown in Fig. 5, and (2) the portion of anomaly readings is also lower. 

IX. Conclusion 

In this paper, we investigated the data recovery problem with considerations of massive missing 
entries, measurement noise, and anomaly readings. An LS-decomposition approach was proposed, which 
decomposed a given partially observed corrupted data matrix into a low-rank matrix and a sparse matrix. 
An accelerated proximal algorithm was devised to solve this problem. Theoretical results were given 
to guarantee the optimality and the converge rate. Our scheme is more robust than standard matrix 
completion methods, thus is expected to have better usability. 

Appendix 

A. Proof of Theorem |7] 

Our proof uses three crucial properties of {L,S): 

• Since {L, S) is a feasible solution to (fTOl) . we have: 

||L||* + A||5||i < ||L||* + A||5||i. 

• The triangle inequality implies that: 

\\Vo{L + S - L - S)\\f < \\Vo{L + S-R)\\f + \\Vo{L + S-R)\\f < 25, 


(24) 

(25) 


since both (L, S) and (L, S) are feasible solutions to (fTOl) . 
The condition we set in (fTTl) to ensure a unique solution: 
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II'Pc)-l(5)||f = ||'Pc)-L(5o)ll^’ = 0- (26) 

The first two properties imply that X = {L,S) is close to 26 = {L,S). Set X = X + H, where 
H = and write = Vr{H), = Vy^{H) for short. Our aim is to bound ||if||F. which 

can be expressed as: 

II^IIf = WH^Wl + = II^^IIf + IK^r X Vn)iH^")\\l + WiVr^ X Vn^){H^")\\l. (27) 

It suffices to bound each term in the righ-hand-side of (l27l) . 

A. Bound the first term of dlTll . 

1177^112^ = + Hs)/2\\l + \\{Hl + 77s)/2||^ = ^\\Hl + Hs\\l 

= ^ (llT’oiffL + ffs)fF + W-Po^Hl + Hs)\\l) (28) 

= 1{\\Vo{Hl + Hs)\\1 + \\Vo4Hl)\\f) 

where the last equation uses a condition drived from (l26l) . i.e., \\Vo‘={Hs)\\‘f — 0- ^rm of 

(|2^ . we already have ||T’c>(i7L + Hs)\\% < 4(5^ from (|25]) . Then, we bound the second term of (|2^ : 

\\ro4HL)\\l = WVTiPo^HLml + \\Vr^{Vo^{HLmF, (29) 


and it suffices fo bound each ferm in the right-hand-side. 


We start with the second term of ([29b . Let W be a dual certificate as in 120-22]. Then, A = UV*+W 
obeys ||T’ 7 -j-(A)|| < 1/2 and ||Pr 2 J-(A)||oo < A/2. We have: 

\\L + Hl\U > \\L + Poo{HL)\U-\\roiHL)\U (30) 


and ( i26ll ) 


\L + Vo4Hl)\U > \\L\U + il-\\rr^{A)\\)\\Vr^iro4HLm*- 


(31) 


Therefore, with ||P 7 -j-(A)|| < 1/2 and \\L + TfiH* < ||7)||*, combining (l30b and (l3Tb we have: 

\\L\U > \\L\U + l\\Pr-iPo4HLm* - WPoiHL)]]. 

\\VT^{Vo‘iHLm* <2\\VoiHL)\U. 


(32) 

(33) 


Since the nuclear norm dominated the Frobenius norm ||T’r^(T’c)=(77L))||F < \\PT-^{Po‘iHL))\\*, we 
have 


\\Vr^iVo4HLmF < 2\\VoiHL)\U < 2V^\\VoiHL)\\F 


( 34 ) 
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where the second inequality follows from the Cauchy-Schwarz inequality. We know that 

\\roiHL)\\F < WVoiHL + Hs)\\f < 26, (35) 

therefore, 

||Pr^(Po»(i?L))||F<4v^<5. (36) 

We then bound ||7^r(^C)‘^(^L))l|F in the first term of (l29ll. Observe that the assumption T’-j-VqT’j- ^ 
(p/2)X together with = Vt-,Vq = Vo, gives: 

\\VoVr{Vo<^{HL))\\l = {VoVt{Voc{Hl)),VoVt{Vo4Hl))) 

= {VoVt{Voc{Hl)),Vt{Voc{Hl))) 


(37) 


But since 


we have 


> ^l|7’r(7’o=(i7L))llF- 


Vo{Vo) = 0 = VoVr{Vo‘={HL) + VoV^^ {Vo'={Hl)), 


\\VoVt{Vo^{Hl))\\f = \ \VoVr^{Vo^{HL))\\F 
< \ \Pt^{Vo^{Hl))\\f- 

Hence, (iJTl) and (1^ together give 


\\Vr{Vo<^{HF)WF < -\\Vt-{Vo^{Hl))\?f. 


Combining 


, we have: 


< 25'^ + 


2 ' 2 + 


P 


(38) 


(39) 


(40) 


(41) 


B. Bound the third term of dlTll . For any matrix pair X = (L, S), we define ||-^||o = ||L||* + A||5||i. 
We have 

\\X + H\\^ > ||X +77^-11^-1177^11^, 

||X + > ||X||^ + (3/4 - \\Vt4A)\\)\\Vt4hI^)\U + (3A/4 - ||Pf,.(A)|U)||Pf,.(77r)||, 

> ll^lb + \i\\rT-{Hr)\U + X\\Vn4H^s")\U 


where the second inequality follows from Lemma 5 of 12211 . 

Therefore, we have 

\\VT^{Hr)\U + X\\Vn4Hf)\U<A\\H^\\^. 

For any matrix Y E we have the following inequalities: 

IIa^IIf < lli^ll* < v^IIa^IIf, ^IIa^IIf < A||y||i < v^||y||F, 

\/n 


(42) 


( 43 ) 
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where we assume A = Then, 

y/n 

\\{Vr- X < WVr-iHDWr + ||Pn-(^r)llF 

<\\Vr-{Hr)\U + XV^\\Vn4H^s")\U 

< 4 v^||i^^|b = 4 v^(ll^^[ll* + ll^slli) 

< An{\\Hl\\F + II^sIIf) = 4.V2n\\H^\\F 

C. Bound the second term of dlTll . By Lemma 6 of 1221], we have 


\%>^\\{VTxVn){H^ )\\i. 


WVriVT X rn){H^ 

But since = 0 = Vri^r x Vn){H^^) + VriVr^ x we have 

\\Vr{VT xVn){H^ )\\f = W'Pri'Pr^ x Vq±){H^ )||i7’ 

<\\{Vr-xVn^){H^")\\F. 

Combing (1451) and (1451) . we have 

\\{Vt X Vn){H^")\\l < mVr- x 


together with (l27l)(l4Tl)(l44l) . leads us to the final result: 

WHWl < 5 X 16 X 2n2||i2^|||, + 

= 320n^i5^(8n + 8n/p + 1) + ^ ^ 8n5‘^ + 2i5^. 

P 

Therefore, we obtain the desired result, 

||i2||F < (8n\/40n + 40n/p + 5 + V2)6. 


(44) 


(45) 


(46) 


(47) 


(48) 


(49) 


B. Proof of Theorem |2] 

First, we need the following three key inequalities. 

Lemma 3. Let { 0 ^, 6 ^} be positive sequences satisfying 

ttk — Ofc+i > bk+i — bk, VA: < l,with oi + 61 < c, c > 0. (50) 

Then, < cfor every k > 1. 

By deduction, we can easily have: 

Lemma 4. The positive sequence {f^} generated in our algorithm via < -— with t-i = 1 

satisfies > (k + l)j2for all k>l. 
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Let Vk = F{Xk) — F{X*), Uk = tkXk — {tk — F)Xk^ — X*. Similarly as is proved in llisll . we have: 


Lemma 5. The sequence {Xk'\ generated in our algorithm satisfies: 

^‘k'^k ~ tkj^iVk+l > ||Ufc_|_i||p — llrr/cIlF- 


(51) 


Now, we are ready to prove our theorem. Let = t^Vk, bk = c = \ \Xkg — then from 

Lemma |4l 

®fc+i — bk-\-\ bk- (52) 

Assume that Ofc + < c holds true for k > 1, comhining Lemma |50l we obtain that: 

tlvk<\\Xk-X*\\l, (53) 

which combined with Lemma |4] yields 


F{Xk)-F{X*) < 


4||Xi 


(54) 


C. Proof of Lemma^ 

From Theorem |2l we know that: 


lim F{Xk) - F{X*) ^ 0, 

k^oo 


which means that our algorithm converges to its globe optimal for large enough k. 
For any e > 0, to guarantee F{Xk) — F{X*) < e, then: 


A\\Xi-X*\\l 


< e, 


Ve 


(55) 


(56) 

(57) 
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